Recognizing Referential Links: 
An Information Extraction Perspective 
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Abstract 

We present an efficient and robust reference reso- 
lution algorithm in an end-to-end state-of-the-art 
information extraction system, which must work 
with a considerably impoverished syntactic analy- 
sis of the input sentences. Considering this disad- 
vantage, the basic setup to collect, filter, then order 
by salience does remarkably well with third-person 
pronouns, but needs more semantic and discourse 
information to improve the treatments of other ex- 
pression types. 

Introduction 

Anaphora resolution is a component technology of 
an overall discourse understanding system. This 
paper focuses on reference resolution in an infor- 
mation extraction system, which performs a par- 
tial and selective 'understanding' of unrestricted 
discourses. 

Reference Resolution in IE 

An information extraction (IE) system automat- 
ically extracts certain predefined target informa- 
tion from real-world online texts or speech tran- 
scripts. The target information, typically of the 
form "who did what to whom where when," is ex- 
tracted from natural language sentences or format- 
ted tables, and fills parts of predefined template 
data structures with slot values. Partially filled 
template data objects about the same entities, en- 
tity relationships, and events are then merged to 
create a network of related data objects. These 
template data objects depicting instances of the 
target information are the raw output of IE, ready 
for a wide range of applications such as database 



updating and summary generation.]^ 

In this IE context, reference resolution takes the 
form of merging partial data objects about the 
same entities, entity relationships, and events de- 
scribed at different discourse positions. Merging 
in IE is very difficult, accounting for a significant 
portion of IE errors in the final output. This pa- 
per focuses on the referential relationships among 
entities rather than the more complex problem of 
event merging. 

An IE system recognizes particular target infor- 
mation instances, ignoring anything deemed irrel- 
evant. Reference resolution within IE, however, 
cannot operate only on those parts that describe 
target information because anaphoric expressions 
within target linguistic patterns may have an- 
tecedents outside of the target, and those that oc- 
cur in an apparently irrelevant pattern may ac- 
tually resolve to target entities. For this reason, 
reference resolution in the IE system needs ac- 
cess to all of the text rather than some selective 
parts. Furthermore, it is reasonable to assume that 
a largely domain-independent method of reference 
resolution can be developed, which need not be 
tailored anew each time a new target is defined. 

In this paper, I discuss one such entity refer- 
ence resolution algorithm for a general geo-political 
business domain developed for SRI's FASTUS"^*^ 
system ( Hobbs et al., 199^ ), one of the leading IE 
systems, which can also be seen as a representative 
of today's IE technology. 
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^The IE technology has undergone a rapid development 
in the 1990s driven by the series of Message Understand- 
ing Conferences (M UCsl in the U.S. governmen t-sponsored 
TIPSTER program (http: //www. tipster .org). 



The Input to Reference Resolution 



Multi ple top scoring sites working on IE have con 
verged on the use of finite-state linguistic patterns 
applied in stages of smaller to larger units. This 
finite-state transduction approach to IE, first intro- 
duced in SRI's FASTUS, has proven effective for 
real-world texts because full parsing is far too am- 
biguous, slow, and brittle against real-world sen- 
tences. This means that we cannot assume correct 
and full syntactic structures in the input to ref- 
erence resolution in a typical IE system. The in- 
put is a set of (often overlapping or discontiguous) 
finite-state approximations of sentence parts. We 
must approximate fine-grained theoretical propos- 
als ab out referential dependencies, and adapt them 



nize coreference among noun phrases ( pundheim 
1995| ). Note that only identity of reference was 



to tiij' prmt.pvt of sparsp and inrnmplpt.p syntartir 

input. 

The input to reference resolution in the the- 
oretical literature is assumed to be fully parsed 
sentences, often with syntactic attributes such as 
grammatical functions and thematic roles on the 
constituents ( IWebber, 197^ ; gidner, 1979| ; |Hobbs 
1978f| prosz, Joshi, and Weinstein, 1995| ). In im- 
plemented reference resolution systems, for pro- 
noun resolution in particular, there seems to be 
a trade-off between the completeness of syntac- 
tic input and the robustness with real-world sen- 
tences. In short, more robust and partial parsing 
gives us wider coverage, but less syntactic infor- 
mation also leads to less accurate reference reso- 
lution. For instance, Lappin and Leass (1994| ) re- 
port an 86% accuracy for a resolution algorithm 



for third-person pronouns using fully parsed sen- 
tences as input. Kennedy and Boguraev (19961 ) 
then report a 75% accuracy for an algorithm that 
approximates Lappin and Leass's with more robust 
and coarse-grained syntactic input. After describ- 
ing the algorithm in the next section, I will briefly 
compare the present approach with these pronoun 
resolution approaches. 

Algorithm 

This algorithm was first implemented for the 
MUC-6 FASTUS system (|Appelt et al., 1995| ), and 
produced one of the top scores (a recall of 59% 
and precision of 72%) in the MUC-6 Coreference 
Task, which evaluated systems' ability to recog- 



evaluated there.| 

The three main factors in this algorithm are (a) 
accessible text regions, (b) semantic consistency, 
and (c) dynamic syntactic preference. The algo- 
rithm is invoked for each sentence after the earlier 
finite-state transduction phases have determined 
the best sequence(s) of nominal and verbal expres- 
sions. Crucially, each nominal expression is associ- 
ated with a set of template data objects that record 
various linguistic and textual attributes of the re- 
ferring expressions contained in it. These data ob- 
jects are similar to discourse referents in discourse 
semantics ( Karttunen, 1976 ; |Kamp, 1981 ; Heim 



1982; Kamp and Reyle, 1993| ), in that anaphoric 



expressions such as she are also associated with 
corresponding anaphoric entities. A pleonastic it 
has no associated entities. Quantificational nomi- 
nals such as each company are associated with en- 
tity objects because they are 'anaphoric' to group 
entities accessible in the context. In this setup, the 
effect of reference resolution is merging of multiple 
entity objects. Here is the algorithm. 

1. INPUT: Template entities with the following 
textual, syntactic, and semantic features: 



(a) 
(b) 

(c) 
(d) 



(f) 
(g) 
(h) 



determiner type (e.g., DEE, INDEF, PRON) 

grammatical or numerical number (e.g., SG, 
PL, 3) 

head string (e.g., automaker) 

the head string sort in a sort hierarchy (e.g., 
aut omaker^ company —i'organizat ion) 

modifier strings (e.g., newly founded, with the 
pilots) 

text span (the start and end byte positions) 

sentence and paragraph position^ 

text region type (e.g., HEADLINE, TEXT) 
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Other referential relationships such as subset and part- 
whole did not reach sufficiently reliable interannotator 
agreements. Only identity of reference had a sufficiently 
high agreement rate (about 85%) between two human 
annotators. 

■^Higher text structure properties such as subsections and 
sections should also be considered if there are any. Exact 
accessibility computation using complex hierarchical text 
structures is a future topic of study. 



2. FOR EACH potentially anaphoric entity object 
in the current sentence, in the left-to-right order, 
DO 

(1) COLLECT antecedent entity objects from the 
accessible text region. 

• For an entity in a HEADLINE text region, 
the entire TEXT is accessible because the 
headline summarizes the text. 

• For an entity in a TEXT region, everything 
preceding its text span is accessible (except for 
the HEADLINE). Intrasentential cataphora 
is allowed only for first-person pronouns. 

• In addition, a locality assumption on 
anaphora sets a (soft) window of search for 
each referring expression type — the entire 
preceding text for proper names, narrower for 
definite noun phrases, even narrower for pro- 
nouns, and only the current sentence for re- 
flexives. In the MUC-6 system, the window 
size was arbitrarily set to ten sentences for 
definites and three sentences for pronouns, 
ignoring paragraph boundaries, and no an- 
tecedents beyond the limit were considered. 
This clearly left ample room for refinement .0 

(2) FILTER with semantic consistency between 
the anaphoric entity Ei and the potential an- 
tecedent entity E2. 

• Number Consistency: Eis number must be 
consistent with i?2's number — for example, 
twelve is consistent with PLURAL, but not 
with SINGULAR. As a special case, plural 
pronouns {they, we) can take singular organi- 
zation antecedents. 

• Sort Consistency: Ei^s sort must ei- 
ther EQUAL or SUBSUME E2's sort. 
This reflects a monotonicity assumption on 
anaphora — for example, since company sub- 
sumes automaker, the company can take a 
Chicago-based automaker as an antecedent, 
but it is too risky to allow the automaker 
to take a Chicago-based company as an 
antecedent. On the other hand, since 



automaker and airline are neither the same 
sort nor in a subsumption relation, an au- 
tomaker and the airline cannot corefer. (The 
system's sort hierarchy is still sparse and in- 
complete.) 

• Modifier Consistency: E^s modifiers must 
be consistent with E2^s modifiers — for ex- 
ample, French and British are inconsistent, 
but French and multinational are consistent. 
(The system doesn't have enough knowledge 
to do this well.) 
(3) ORDER by dynamic syntactic preference. The 
following ordering approximates the relative 
salience of entities. The basic underlying 
hypothesis is that intrasentential candidates 
are more salient than intersentential candi- 
dates as proposed, for example, in Hobbs| 
(197^ ) and Kameyama (in press| ), and that 



11. 



ui. 



fine-grained syntax-based salience fades with 
time. Since fine-grained syntax with gram- 
matical functions is unavailable, the syntactic 
prominence of subjects and left-dislocation is 
approximated by the left-right linear ordering. 

the preceding part of the same sentence in 
the left-right order 

the immediately preceding sentence in the 
left-right order 

other preceding sentences within the 'limit' 
(see above) in the right-left order 



''For example, in a more recent FASTUS system, para- 
graphs are also considered in setting the limit, and at most 
one candidate beyond the limit is proposed when no candi- 
dates are found within the limit. 
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3. OUTPUT: After each anaphoric entity has found 
an ordered set of potential antecedent entities, 
there are destructive (indefeasible) and nonde- 
structive (defeasible) options. 

(a) Destructive Option: MERGE the anaphoric 
entity into the preferred antecedent entity . 

(b) Nondestructive Option: RECORD the an- 
tecedent entity list in the anaphoric entity to 
allow reordering (i.e., preference revisions) by 
event merging or overall model selection. 

The MUC-6 system took the destructive op- 
tion. The nondestructive option has been im- 
plemented in a more recent system. 

These basic steps of "COLLECT, FILTER, and 
ORDER by salience" are analogous to Lappin and 



Leass's (1994) pronoun resolution algorithm, but 
each step in FASTUS relies on considerably poorer 
syntactic input. The present algorithm thus pro- 
vides an interesting case of what happens with 
extremely poor syntactic input, even poorer than 
in Kennedy and Boguraev's (1996) system. This 
comparison will be discussed later. 

Name Alias Recognition 

In addition to the above general algorithm, a 
special-purpose alias recognition algorithm is in- 
voked for coreference resolution of proper names .0 

1. INPUT: The input English text is in mixed 
cases. An earlier transduction phase has rec- 
ognized unknown names as well as specific- 
type names for persons, locations, or 
organizations using name-internal pattern 
matching and known name lists. 

2. FOR EACH new sentence, FOR EACH unknown 
name, IF it is an alias or acronym of an- 
other name already recognized in the given text, 
MERGE the two — an alias is a selective sub- 
string of the full name (e.g.. Colonial for Colo- 
nial Beef), and acronym is a selective sequence 
of initial characters in the full name (e.g., CM 
for Ceneral Motors). 

Overall Performance 

The MUC-6 FASTUS reference resolution algo- 
rithm handled only coreference (i.e., identity of ref- 
erence) of proper names, definites, and pronouns. 
These are the 'core' anaphoric expression types 
whose dependencies tend to be constrained by sur- 
face textual factors such as locality. The MUC-6 
Coreference Task evaluation included coreference 
of bare nominals, possessed nominals, and indefi- 
nites as well, which the system did not handle be- 
cause we didn't have a reliable algorithm for these 
mostly 'accidental' coreferences that seemed to re- 
quire deeper inferences. Nevertheless, the system 
scored a recall of 59% and precision of 72% in the 
blind evaluation of thirty newspaper articles. 



Expression Type 


Number of 


Correctly 




Occurrences 


Resolved 


Definites 


61 


28(46%) 


Pronouns 


39 


24(62%) 


Proper Names 


32 


22(69%) 


Reflexives 


1 


1(100%) 


TOTAL 


133 


75(56%) 



Table 1: Core Discourse Anaphors in Five Articles 



Grammatical 


Intra/Inter-S 


Number of 


Correctly 


Person 


Antecedent 


Occurrences 


Resolved 


3rd person 


intra- S 


27 


21(78%) 


3rd person 


inter-S 


6 


2(33%) 


that 


inter-S 


1 


0(0%) 


lst/2nd person 


inter-S 


5 


1(20%) 


reflexive 


intra- S 


1 


1(100%) 



^In addition, a specific-type name may be converted into 
another type in certain linguistic contexts. For instance, in 
a subsidiary of Mrs. Field, Mrs. Field is converted from a 
person name into a company name. 
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Table 2: Pronouns in Five Articles 



Table Q shows the system's performance in re- 
solving the core discourse anaphors in five ran- 
domely selected articles from the development set. 
Only five articles were examined here because the 
process was highly time-consuming. The perfor- 
mance for each expression type varies widely from 
article to article because of unexpected features of 
the articles. For instance, one of the five articles is 
a letter to the editor with a text structure drasti- 
cally different from news reports. On average, we 
see that the resolution accuracy (i.e., recall) was 
the highest for proper names (69%), followed by 
pronouns (62%) and definites (46%). There were 
not enough instances of refiexives to compare. 

Table |^ shows the system's performance for pro- 
nouns broken down by two parameters, gram- 
matical person and inter- vs. intrasentential 
antecedent. The system did quite well (78%) 
with third-person pronouns with intrasentential 
antecedents, the largest class of such pronouns. 

Part of the pronoun resolution performance here 
enables a preliminary comparison with the results 
reported in (1) [Lappin and Leass (1994 ) and (2) 
Kennedy and Boguraev (19961 ). For the third- 
person pronouns and refiexives, the performance 
was (1) 86% of 560 cases in five computer manu- 
als and (2) 75% of 306 cases in twenty-seven Web 
page texts. The present FASTUS system correctly 
resolved 71% of 34 cases in five newspaper arti- 



cles. This progressive decline in performance cor- 
responds to the progressive decUne in the amount 
of syntactic information in the input to reference 



resolution. To summarize the latter decline, Lap 



pin ai .d Leass (1994| ) had the following components 
in their algorithm. 

1. INPUT: fully parsed sentences with grammati- 
cal roles and head-argument and head-adjunct 
relations 

2. Intrasentential syntactic filter based on syntactic 
noncoreference 

3. Morphological filter based on person, number, 
and gender features 

4. Pleonastic pronoun recognition 

5. Intrasentential binding for reflexives and recip- 
rocals 

6. Salience computation based on grammatical 
role, grammatical parallelism, frequency of men- 
tion, proximity, and sentence recency 

7. Global salience computation for noun phrases 
(NPs) in equivalence classes (with seven salience 
factors) 

8. Decision procedure for choosing among equally 
preferred candidate antecedents 

Kennedy and Boguraev (1996) approximated the 
above components with a poorer syntactic input, 
which is an output of a part-of-speech tagger with 
grammatical function information, plus NPs recog- 
nized by finite-state patterns and NPs' adjunct and 
subordination contexts recognized by heuristics. 
With this input, grammatical functions and prece- 
dence relations were used to approximate 2 and 
5. Finite-state patterns approximated 4. Three 
additional salience factors were used in 7, and a 
preference for intraclausal antecedents was added 
in 6; 3 and 8 were the same. 

The present algorithm works with an even poorer 
syntactic input, as summarized here. 

1. INPUT: a set of finite-state approximations of 
sentence parts, which can be overlapping or dis- 
contiguous, with no grammatical function, sub- 
ordination, or adjunct information. 
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2. No disjoint reference filter is used. 

3. Morphological filter is used. 

4. Pleonastic pronouns are recognized with finite- 
state patterns. 

5. Reflexives simply limit the search to the current 
sentence, with no attempt at recognizing coar- 
guments. No reciprocals are treated. 

6. Salience is approximated by computation based 
on linear order and recency. No grammatical 
parallelism is recognized. 

7. Equivalence classes correspond to merged entity 
objects whose 'current' positions are always the 
most recent mentions. 

8. Candidates are deterministically ordered, so no 
decision procedure is needed. 

Given how little syntactic information is used in 
FASTUS reference resolution, the 71% accuracy in 
pronoun resolution is perhaps unexpectedly high. 
This perhaps shows that linear ordering and re- 
cency are major indicators of salience, especially 
because grammatical functions correspond to con- 
stituent ordering in English. The lack of disjoint 
reference filter is not the most frequent source of 
errors, and a coarse-grained treatment of reflexives 
does not hurt very much, mainly because of the in- 
frequency of reflexives. 

An Example Analysis 

In the IE context, the task of entity reference res- 
olution is to recognize referential links among par- 
tially described entities within and across docu- 
ments, which goes beyond third-person pronouns 
and identity of reference. The expression types to 
be handled include bare nominals, possessed nomi- 
nals, and indefinites, whose referential links tend to 
be more 'accidental' than textually signaled, and 
the referential links to be recognized include sub- 
set, membership, and part- whole. 

Consider one of the five articles evaluated, the 
one with the most number and variety of referential 
links, for which FASTUS's performance was the 
poorest. Even for the 'core' anaphoric expressions 
of 20 definites, 14 pronouns, and 7 names limited 



Expression Type 


Number of 


Correctly 




Occurrences 


Resolved 


Definites 


25 


10(40%) 


Pronouns 


14 


10(71%) 


Bare Nominals 


12 


3(25%) 


Proper Names 


7 


2(29%) 


Possessed Nominals 


6 


0(0%) 


Indefinites 


3 


0(0%) 


TOTAL 


67 


25(37%) 



Table 3: Referential Links in the Example 



to coreference, for which the code was prepared, 
the recall was only 51%. 

Figure |l| shows this article annotated with ref- 
erential indices. The same index indicates corefer- 
ence. Index subscripts (such as 4a) indicate subset, 
part, or membership of another expression (e.g., 
indexed 4). The index number ordering, 1,...,N, 
has no significance. Each sentence in the TEXT 
region is numbered with paragraph and sentence 
numbers, so, for instance, 2-1 is the first sentence 
in the second paragraph. 

Note that not all of these referential links need 
be recognized for each particular IE application. 
However, since reference resolution must consider 
all of the text for any particular application for the 
reason mentioned above, it is reasonable to assume 
that an ideal domain-independent reference resolu- 
tion component should be able to recognize all of 
these. Is this a realistic goal, especially in an IE 
context? This question is left open for now. 

Error Analysis 

Table ^ shows the system's performance in recog- 
nizing the referential links in this article, grouped 
by referring expression types. These exclude the 
initial mention of each referential chain. Notable 
sources of errors and necessary extensions are sum- 
marized for each expression type here. 

Pronouns: Of the four pronoun resolution errors, 
one is due to a parse error (American in 7-1 was 
incorrectly parsed as a person entity, to which 
she in 8-1 was resolved), that in 6-2 is a discourse 



deixis ( Webber, 1988 ), beyond the scope of the 
current approach, and two errors {it in 3-1 and 
its in 7-1) were due to the left-right ordering of 
intrasentential candidates. Recognition of par- 
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allelism among clause conjuncts and a stricter 
locality preference for possessive pronouns may 
help here. 

Definites: Of the fifteen incorrect resolutions of 
definites, five have nonidentity referential rela- 
tionships, and hence were not handled. These 
nonidentity cases must be handled to avoid er- 
roneous identity-only resolutions. Two errors 
were due to the failure to distinguish between 
generic and specific events. Token-referring def- 
inites (the union in 8-2 and the company in 9-1) 
were incorrectly resolved to recently mentioned 
types. Three errors were due to the failure to 
recognize synonyms between, for example, call 
(3-2) vs. request (3-1) and campaign (9-1) vs. 
strategy (9-1). Other error sources are a failure 
in recognizing an appositive pattern (9-1), the 
left-right ordering of the candidates in the pre- 
vious sentence (9-2), and three 'bugs'. 

Proper Names: Name alias recognition was un- 
usually poor (2 out of 7) because American was 
parsed as a person-denoting noun. The lower- 
case print in Patt gibbs also made it difficult to 
link it with Ms. Gibbs. Such parse errors and 
input anomalies hurt performance. 

Bare Nominals: Since bare nominals were not 
explicitly resolved, the only correct resolutions 
(3 out of 12) were due to recognition of ap- 
positive patterns. How can the other cases be 
treated? We need to understand the discourse 
semantics of bare nominals better before devel- 
oping an effective algorithm. 

Possessed Nominals: A systematic treatment of 
possessed nominals is a necessary extension. The 
basic algorithm will look like this — resolve the 
possessor entity A of the possessed nominal 
A's B, then if there is already an entity of type 
B associated with A mentioned anywhere in the 
preceding text, then this is the referent of B. Pos- 
sessed nominal resolution also requires a 'syn- 
onymy' computation to resolve, for example, its 
battle (9-1) to its corporate campaign (7-1). It 
also needs 'inferences' that rely on multiple suc- 
cessful resolutions. For instance, her members 
in 8-1 must first resolve her to Ms. Gibbs, who 



HEADLINE: American Airlinesi Calls for Mediatioms In Itsi Union4 Talks2 

DOCUMENT DATE: 02/09/87 
SOURCE: WALL STREET JOURNAL 

1- 1 Amr corp.'s American Airlinesi unit said iti has called for federal mediationis in itsi contract talks2 with unions4 

representing itsi pilotsis and flight attendantsiy. 

2- 1 A spokesman for the companyi said Americani officials "felt talks2 had reached a point where mediationis would be 

helpful." 

2- 2 Negotiations2a with the pilotsie have been going on for 11 months; talks26 with flight attendantsir began six months 

ago. 

3- 1 The presidents of the Association of Professional Flight Attendants4a, which represents Americani 's more than 

10,000 flight attendantsi7, called the requests for mediationis "premature" and characterized its as a bargaining 
tactic that could lead to a lockout. 

3- 2 Patt gibbss, presidents of the association4o, said talks26 with the companyi seemed to be progressing well and the 

calls for mediationis came as a surprise. 

4- 1 The major outstanding issue in the negotiations26 with the flight attendantsiT is a two-tier wage scale, in which recent 

employees' salaries increase on a different scale than the salaries of employees who have worked at americani for a longer 
time. 

4- 2 The union4a wants to narrow the differences between the new scale and the old one. 

5- 1 The companyi declined to comment on the negotiations26 or the outstanding issues. 

5- 2 Representatives for the 5,400-member Allied Pilots Association46 didn't return phone calls. 

6- 1 Under the Federal Railway Labor Act, [if the mediatorisa fails to bring the two sidesr together and the two sidesy 

don't agree to binding arbitration, a 30-day cooling-off period followsje- 

6- 2 After thate, the unionya can strike or the company7a can lock the unionya out. 

7- 1 Ms. GibbsB said that in response to the companyi 's move, hers union4a will be "escalating" its4a "corporate campaign" 

against Americani over the next couple of months. 

7- 2 In a corporate campaignio, a uniong tries to get a companys's financiers, investors, directors and other financial 

partners to pressure the companys to meet uniong demands. 

8- 1 A corporate campaignio, shes said, appeals to hers membersir because "itio is a nice, clean way to take a job action, 

and our4a womeniy are hired to be nice." 

8- 2 the union4a has decided not to strike, shes said. 

9- 1 The union4a has hired a number of professional consultantsi4 in its4a battleis with the companyi, including Ray 

Rogersi4a of Corporate Campaign Inc., the New York labor consultanti4o who developed the strategyi2 at Geo. 
A. Hormel &: Co.is's Austin, Minn., meatpacking plant last year. 

9-2 That campaigni2, which included a strike, faltered when the companyis hired new workers and the International 
Meatpacking Union wrested control of the local union from Rogersi4a' supporters. 



Figure 1: Example Article Annotated with Referential Links 
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is president of the association, and this 'associa- 
tion' is the Association of Professional Flight At- 
tendants. After it is understood as 'the members 
of the Association of Professional Fhght Atten- 
dants,' the corcfcrcnce with the flight attendants 
can be inferred. Similarly for our women in 8-1. 

Indefinites: Some indefinites are 'coreferential' to 

generic types, for example, a corporate campaign 
(7-2, 8-1). Given the difficulty in distinguishing 
between generic and specific event descriptions, 
it is unclear whether it will ever be treated in a 
systematic way. 

Conclusions 

In an operational end-to-end discourse understand- 
ing system, a reference resolution component must 
work with input data containing parse errors, lex- 
icon gaps, and mistakes made by earlier reference 
resolution. In a state-of-the-art IE system such 
as SRI's FASTUS, reference resolution must work 
with considerably impoverished syntactic analysis 
of the input sentences. The present reference res- 
olution approach within an IE system is robust 
and efficient, and performs pronoun resolution to 
an almost comparable level with a high-accuracy 
algorithm in the literature. Desirable extensions 
include nonidentity referential relationships, treat- 
ments of bare nominals, possessed nominals, and 
indefinites, type-token distinctions, and recogni- 
tion of synonyms. Another future direction is to 
turn this component into a corpus-based statisti- 
cal approach using the relevant factors identified 
in the rule-based approach. The need for a large 
tagged corpus may be difficult to satisfy, however. 
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